Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
- 
            With the rapid advance of Deep Neural Networks (DNNs), the GPU's role as a hardware accelerator is becoming increasingly important. Because GPUs consume significant power, developing high-performance and power-efficient GPU systems is a critical challenge. DNN applications need to move a large amount of data between memory and the processing cores, which consumes a great amount of NoC power. However, previously proposed lossless data compression schemes cannot achieve optimal performance and energy efficiency because they do not take advantage of the error resilience of DNNs. In this work, we propose an NoC architecture that can reduce power consumption without compromising performance and accuracy. Our technique takes advantage of the error resilience of DNNs as well as the data locality in the floating-point data representation of DNNs. Each data packet is reorganized by grouping data with similar bits, such as in the exponents, and redundant bits are sent only once. We further compress the mantissa fields by appropriately selecting "proxy" values for data sharing the same exponent. Our evaluation results show that the proposed technique can effectively reduce the amount of data transmitted and lead to better performance and power trade-offs while preserving accuracy.
            Free, publicly-accessible full text available June 30, 2026
- 
            Free, publicly-accessible full text available June 24, 2026
- 
            Excitation transfer across the interfaces between graphene, perylenetetracarboxylic diimide (PTCDI), and titanyl phthalocyanine (TiOPc) was studied by using transient absorption and photoluminescence spectroscopy. Both photoluminescence quenching and transient absorption measurements confirm the presence of a type-II interface between PTCDI and TiOPc. While the graphene/PTCDI interface is expected to exhibit type-I behavior, transient absorption measurements indicate that only electrons transfer from PTCDI to graphene, with no evidence of hole transfer. Density functional theory calculations reveal significant ground-state electron transfer from graphene to PTCDI, resulting in band bending that prevents excited holes from transferring from PTCDI to graphene. This feature is exploited in a trilayer heterostructure of graphene/PTCDI/TiOPc, where the spatial separation of photoexcited electrons and holes in graphene and TiOPc, respectively, leads to the formation of long-lived photoexcitations with a lifetime of approximately 500 ps. Furthermore, spatially resolved transient absorption measurements reveal the immobile nature of these excitations, confirming that they are charge-transfer excitons rather than free electrons and holes. These results provide valuable insights into the complex interlayer photoexcitation transfer properties and demonstrate precise control over the layer population and the recombination lifetime of photocarriers in such hybrid heterostructures.
            Free, publicly-accessible full text available April 10, 2026
- 
            As Next-Generation Sequencing (NGS) techniques need to process enormous amounts of data, cost-efficient and high-throughput computational analysis is essential in genomics study. Conventional computing platforms face great challenges in meeting these demands due to their limited processing speed and scalability. Hardware accelerators, such as Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), and Application-Specific Integrated Circuits (ASICs), offer transformative solutions to these computational challenges. This paper provides a state-of-the-art review of the roles of hardware accelerators in genomic analysis. We performed a comprehensive and in-depth analysis of cutting-edge genomics hardware accelerators, such as GPUs, FPGAs, and ASICs, in the context of the specific algorithms they aim to enhance. Besides reviewing opportunities in hardware genome acceleration, we also provide insights into the challenges regarding processing speed, cost efficiency, and scalability.
            Free, publicly-accessible full text available December 16, 2025
- 
            We report experimental evidence that MoSe2 and WS2 allow the formation of type-I and type-II interfaces, according to the thickness of the former. Heterostructure samples are obtained by stacking a monolayer WS2 flake on top of a MoSe2 flake that contains regions of thickness from one to four layers. Photoluminescence spectroscopy and transient absorption measurements reveal a type-II interface in the regions of monolayer MoSe2 in contact with monolayer WS2. In other regions of the heterostructure formed by multilayer MoSe2 and monolayer WS2, features of type-I interface are observed, including the absence of charge transfer and dominance of intralayer excitons in MoSe2. The coexistence of type-I and type-II interfaces in a single heterostructure offers opportunities to design sophisticated two-dimensional materials with finely controlled photocarrier behaviors.
            Free, publicly-accessible full text available January 27, 2026
- 
            In recent years, Network-on-Chip (NoC) has emerged as a promising solution for addressing a critical performance bottleneck encountered in designing large-scale multi-core systems, i.e., data communication. With advancements in chip manufacturing technologies and the increasing complexity of system designs, the task of designing the communication subsystems has become increasingly challenging. The emergence of hardware accelerators, such as GPUs, FPGAs, and ASICs, together with heterogeneous system integration of the CPUs and the accelerators, creates new challenges in NoC design. Conventional NoC architectures developed for CPU-based multi-core systems are not able to satisfy the traffic demands of heterogeneous systems. In recent years, numerous research efforts have been dedicated to exploring the various aspects of NoC design in hardware accelerators and heterogeneous systems. However, there is a need for a comprehensive understanding of the current state-of-the-art research in this emerging research area. This paper aims to provide a summary of research work conducted in heterogeneous NoC design. Through this survey, we aim to present a comprehensive overview of the current related research, highlighting key findings, challenges, and future directions in this field.
            Free, publicly-accessible full text available December 16, 2025
- 
            Heterogeneous chiplets have been proposed for accelerating high-performance computing tasks. Integrated inside one package, CPU and GPU chiplets can share a common interconnection network that can be implemented through the interposer. However, CPU and GPU applications generally have very different traffic patterns. Without effective management of the network resources, some chiplets can suffer significant performance degradation because the network bandwidth is taken away by communication-intensive applications. Therefore, techniques need to be developed to effectively manage the shared network resources. In a chiplet-based system, resource management needs not only to react in real time but also to be cost-efficient. In this work, we propose a reconfigurable network architecture that leverages a Kalman filter to make accurate predictions of the network resources needed by the applications and then adaptively changes the resource allocation. Using our design, the network bandwidth can be fairly allocated to avoid starvation or performance degradation. Our evaluation results show that the proposed reconfigurable interconnection network can dynamically react to changes in the traffic demand of the chiplets and improve system performance with low cost and design complexity.
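
The last abstract above describes predicting each chiplet's network demand with a Kalman filter and then reallocating shared bandwidth. The following is only a minimal sketch of that general idea, not the authors' design: it assumes a one-dimensional filter with a random-walk demand model, and the names `ScalarKalman` and `allocate`, the noise variances, and the "links" unit of bandwidth are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """One-dimensional Kalman filter with a random-walk state model (assumed)."""
    estimate: float = 0.0      # current estimate of a chiplet's traffic demand
    variance: float = 1.0      # uncertainty of that estimate
    process_var: float = 0.01  # hypothetical drift of demand between epochs
    measure_var: float = 0.25  # hypothetical noise in the measured traffic

    def update(self, measured: float) -> float:
        # Predict step: a random walk keeps the estimate, uncertainty grows.
        prior_var = self.variance + self.process_var
        # Correct step: blend the prediction with the new measurement.
        gain = prior_var / (prior_var + self.measure_var)
        self.estimate += gain * (measured - self.estimate)
        self.variance = (1.0 - gain) * prior_var
        return self.estimate


def allocate(predicted: dict, total_links: int, min_links: int = 1) -> dict:
    """Split total_links in proportion to predicted demand.

    Every chiplet keeps at least min_links so that none is starved.
    """
    names = list(predicted)
    spare = total_links - min_links * len(names)
    if spare < 0:
        raise ValueError("not enough links for the guaranteed minimum")
    demand = {n: max(predicted[n], 0.0) for n in names}
    total = sum(demand.values())
    if total == 0:
        shares = {n: spare / len(names) for n in names}
    else:
        shares = {n: spare * demand[n] / total for n in names}
    alloc = {n: min_links + int(shares[n]) for n in names}
    # Hand out links lost to rounding, largest fractional share first.
    leftover = total_links - sum(alloc.values())
    by_fraction = sorted(names, key=lambda n: shares[n] - int(shares[n]), reverse=True)
    for n in by_fraction[:leftover]:
        alloc[n] += 1
    return alloc
```

In use, one filter per chiplet would be updated with the traffic measured in each epoch, and the resulting estimates passed to `allocate` to choose the next epoch's split. The guaranteed minimum reflects the abstract's goal of avoiding starvation; how the actual architecture enforces fairness is not stated in the abstract.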
 An official website of the United States government
